Learning Domain-Specific, L1-Specific Measures of Word Readability

Authors

  • Shane Bergsma
  • David Yarowsky
Abstract

Improved readability ratings for second-language readers could have a huge impact in areas such as education, advertising, and information retrieval. We propose ways to adapt readability measures for users who (a) are proficient in a particular domain, and (b) have a particular native language (L1). Specifically, we predict the readability of individual words. Our learned models use a range of creative features based on diverse statistical, etymological, lexical, and morphological information. We evaluate on a corpus of computational linguistics articles divided according to seven L1s; we show that we can accurately predict the target readability scores in this domain. Our technique improves over several reasonable baselines. We provide an in-depth analysis showing which kinds of information are most predictive of word difficulty in different L1s, and show how this differs for style and content words.
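The abstract says the learned models draw on statistical, etymological, lexical, and morphological information about each word. As a rough illustration only, the sketch below shows what a few word-level features and a linear scorer might look like. The feature names, the suffix list, and the functions `word_features` and `linear_score` are all hypothetical and are not taken from the paper; the authors' actual feature set and learner are not described on this page.

```python
import math
import re

# Hypothetical suffix list, used here as a crude proxy for Latinate etymology.
LATINATE_SUFFIXES = ("tion", "sion", "ity", "ment", "ance", "ence")


def word_features(word, corpus_count, corpus_size):
    """Map a word to a small dictionary of illustrative numeric features."""
    w = word.lower()
    return {
        # Lexical: word length in characters.
        "length": float(len(w)),
        # Statistical: log relative frequency with add-one smoothing.
        "log_freq": math.log((corpus_count + 1) / (corpus_size + 1)),
        # Etymological/morphological proxy: ends in a Latinate suffix.
        "latinate_suffix": 1.0 if w.endswith(LATINATE_SUFFIXES) else 0.0,
        # Rough syllable proxy: number of maximal vowel runs.
        "vowel_groups": float(len(re.findall(r"[aeiouy]+", w))),
    }


def linear_score(features, weights, bias=0.0):
    """Score a word as a weighted sum of its features (a linear model)."""
    return bias + sum(
        weights.get(name, 0.0) * value for name, value in features.items()
    )


# Example: features for one domain term, with made-up corpus counts.
feats = word_features("tokenization", corpus_count=120, corpus_size=1_000_000)
```

In a setting like the one the abstract describes, one set of weights could in principle be fit per L1, so that the same word receives a different score for readers with different native languages. That is an assumption about how such a model could be organized, not a statement of the authors' design.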

Similar resources

The Effectiveness of Cognitive Rehabilitation on Mathematical Word Problem Solving in Students with Specefic Learning Disability with Impairment in Mathematic

The aim of the current research was to study the effectiveness of cognitive rehabilitation on mathematical word problem solving in students with specific learning disability with impairment in mathematics. The research design was quasi-experimental, with pre-test and post-test and a control group. The statistical population of the study included all male students with specific learning disability wi...


Approach to Word Problem Solving in Students with Specific Learning Disorders: A Review Study

Background and Objective: Today, different teaching approaches have been offered to solve word problems. Schema-based instruction is one of these new approaches. This study aimed to identify and determine the nature, stages, research evidence, and effectiveness of schema-based instruction on resolving students’ word problems. Materials and Methods: This is a review study. Studies and resources...


N-gram Fragment Sequence Based Unsupervised Domain-Specific Document Readability

Traditional general readability methods tend to underperform in domain-specific document retrieval because they fail to effectively differentiate the reading difficulty of the individual domain-specific terms and the semantic associations between the textual units in a document. On the other hand, recently proposed domain-specific readability methods have relied upon an external knowledge base ...


EFL Textbook Evaluation: An Analysis of Readability and Vocabulary Profiler of Four Corners Book Series

This study aimed to investigate whether there is any significant relationship between the readability and vocabulary profile including the most frequent words (K1 words) and academic word list (AWL) of reading passages of Four Corners series which were EFL textbooks. To determine the readability of the texts, the Flesch–Kincaid (1975) readability test was used, while the texts' academic word li...


Journal:
  • TAL

Volume 54

Publication date: 2013